Background: Autoantibodies have been detected in sera before diagnosis of cancer, leading to interest in their potential as 
screening/early detection biomarkers. As we have found autoantibodies to MUC1 glycopeptides to be elevated in early-stage 
breast cancer patients, in this study we analysed these autoantibodies in large population cohorts of sera taken before cancer 
diagnosis. 

Methods: Serum samples from women who subsequently developed breast cancer, and age-matched controls, were identified 
from the UK Collaborative Trial of Ovarian Cancer Screening (UKCTOCS) and Guernsey serum banks to form discovery and 
validation sets. These were screened on a microarray platform of 60mer MUC1 glycopeptides and recombinant MUC1 containing 
16 tandem repeats. Additional case-control sets comprising women who subsequently developed ovarian, pancreatic and lung 
cancer were also screened on the arrays. 

Results: In the discovery (273 cases, 273 controls) and the two validation sets (UKCTOCS 426 cases, 426 controls; Guernsey 303 
cases and 606 controls), no differences were found in autoantibody reactivity to MUC1 tandem repeat peptide or glycoforms 
between cases and controls. Furthermore, no differences were observed between ovarian, pancreatic and lung cancer cases and 
controls. 

Conclusion: This robust, validated study shows that autoantibodies to MUC1 peptide or glycopeptides cannot be used for breast, 
ovarian, lung or pancreatic cancer screening. This has significant implications for research on the use of MUC1 in cancer detection. 



Early detection remains the most promising approach to improving 
survival of cancer patients. In breast cancer, mammographic 
screening significantly impacts on mortality (Kerlikowske et al, 
1995), although controversy exists as to possible overdiagnosis 
(Gotzsche and Nielsen, 2011). Use of serum biomarkers for the 
early detection of cancer, before development of clinical symptoms, 
is an attractive goal being minimally invasive and potentially highly 
cost-effective. Screening for autoantibodies rather than the antigens 
may improve sensitivity as substantial tumour mass may be required 
before the antigen can be detected in serum, whereas autoantibodies 
act as biological amplifiers increasing the detectable signal from the 
antigen. Indeed, specific autoantibodies have been reported to be 
present in sera of patients before clinical diagnosis of cancer (Lubin 
et al, 1995; Li et al, 2005; Zhong et al, 2006; Desmetz et al, 2011; 
Chapman et al, 2012; Lu et al, 2012; Pedersen et al, 2013) and are 
under trial for the detection of lung cancer (Chapman et al, 2012). 

The antigen MUC1 is upregulated in breast and other cancers, 
and is also aberrantly glycosylated, adding another dimension to the 
cancer specificity. The mucin carries large numbers of O-linked 
glycans which in breast cancer are truncated, resulting in the 
appearance of cancer-specific glycopeptide epitopes, which are 
antigenically distinct (Sorensen et al, 2006; Tarp et al, 2007; 
Wandall et al, 2010). Using a novel O-glycopeptide/glycoprotein 
array-based assay detecting IgG antibodies, we have recently shown 
that autoantibodies reactive with the cancer-associated glycopeptide 
epitopes can be detected in sera from 30% of early breast cancer 
patients (Blixt et al, 2011). Moreover, high levels of autoantibodies 
were significantly associated with reduced risk of relapse and 
increased time to metastasis (Blixt et al, 2011). These encouraging 
results led us to explore whether autoantibodies to tumour-associated 
glycoforms of MUC1 could be used as a serum biomarker 
for detection of breast and other cancers before clinical diagnosis. 

With a few exceptions that used prospective sera collections 
(Pinheiro et al, 2010; Chapman et al, 2012; Lu et al, 2012; Pedersen 
et al, 2013), most serum biomarker discovery studies for early 
detection of cancer have been carried out on sera collected from 
patients at diagnosis (Chapman et al, 2007; Zhong et al, 2008; 
Boyle et al, 2011; Lacombe et al, 2013) or involved small cohorts 
with no independent validation (Lubin et al, 1995; Li et al, 2005; 
Robertson et al, 2005; Zhong et al, 2006; Pereira-Faca et al, 2007). 
Here, we report on MUC1 glycopeptide microarray analysis of 
serum samples from over 2000 women from the general 
population in nested breast cancer case-control studies involving 
two prospectively collected serum banks of initially healthy 
women: the UK Collaborative Trial of Ovarian Cancer Screening 
(UKCTOCS) with 202 638 women recruited between 2001 and 2005 
(Menon et al, 2009) and the Guernsey island serum bank with 6500 
women recruited between 1977 and 1991 (Fentiman et al, 2006). 
Complete follow-up for cancer and mortality was available for both 
cohorts. Moreover, it was possible to include an additional control 
group from the Guernsey cohort that consisted of women who had 
not developed any form of cancer up to 32 years (range 18-32) 
after serum donation. As MUC1 is expressed by most adenocarcinomas, 
we were also able to screen sera from apparently healthy 
women in the UKCTOCS bank who later developed lung, 
pancreatic or ovarian cancer and controls. 

The robust, validated study reported here, which was 
carried out in accordance with STARD guidelines, is important because 
considerable effort and resources are being focused on the analysis 
of autoantibodies for early cancer detection, and MUC1 is a 
commonly used antigen. 



MATERIALS AND METHODS 



Subjects. The cases and controls were identified from two cohorts 
(UKCTOCS and Guernsey) of women who were clinically healthy 
at recruitment. Serum samples from individuals participating in 
the multimodal arm of UKCTOCS trial (Menon et al, 2009) were 
included. In this trial, 50 640 women were randomised to the 
multimodal group between 2001 and 2005, and donated samples 
annually until 2011. All women were followed via electronic 
flagging for cancer registration and death through the Medical 
Research Information System in England and Wales and the 
Central Services Agency and Cancer Registry in Northern Ireland. 
The volunteers consented to use of their serum samples in further 
secondary studies and all exceptions to this were recorded and 
honoured. This current study was approved by the joint University 
College London (UCL)/University College London Hospital 
(UCLH) Committees on the Ethics of Human Research (Committee A; 
Ref 05/Q0505/57) on 7 February 2008. Controls were 
women from the same trial centre who had no history of any 
cancer at last follow-up, and who had donated serial serum 
samples during the same period. 

The Guernsey cohort consists of 6500 women aged 35 and 
over living on the island of Guernsey who were recruited between 
1977 and 1991 (Fentiman et al, 2006). All women donated a serum 
sample at recruitment and underwent mammographic screening. 
Women were followed up by regular visits to Guernsey to access the 
hospital records and obtain copies of all female death certificates. 
Information was also received from the South West Cancer Registry 
for female Guernsey patients treated in Southampton (mainland 
UK). Thus, all cases of cancer have documentation by pathology 
report or death certificate, and occasionally radiology reports. 
Additionally, checks were made at the island registry (The Greffe) 
for changes of name through marriage or deed poll. Written 
informed consent was obtained from each volunteer. This consent 
covered use of the serum for the investigations of cancer biomarkers 
and access to the women's medical records. Ethical approval for 
access to the medical records of the volunteers who donated 
sera to the Guernsey bank was obtained from the Guernsey and Alderney 
Ethical Committee. 

Samples 

Breast cancer. Discovery sample set: Sera from the UKCTOCS 
women who went on to develop invasive breast cancer were 
identified. Women with a previous cancer history at recruitment 
were excluded. The cases were matched 1 : 1 to controls (healthy 
women with no notification of cancer when the case was identified) 
on age at donation (±1 year) and length of frozen storage 
(±1 year). 
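
The 1 : 1 matching rule described above can be illustrated with a short Python sketch. This is not the procedure used to assemble the study sets, which is not specified beyond the matching criteria; the greedy strategy, the function name `match_controls` and the record fields `id`, `age` and `storage_years` are all assumptions made for illustration.

```python
def match_controls(cases, controls, tolerance=1.0):
    """Greedy 1:1 matching of cases to controls (illustrative only).

    Each record is a dict with hypothetical keys 'id', 'age' (years at
    donation) and 'storage_years' (length of frozen storage). A control
    qualifies when both values lie within `tolerance` years of the case.
    Each control is used at most once.
    """
    available = list(controls)
    pairs = []
    for case in cases:
        for i, ctrl in enumerate(available):
            age_ok = abs(case["age"] - ctrl["age"]) <= tolerance
            storage_ok = abs(case["storage_years"] - ctrl["storage_years"]) <= tolerance
            if age_ok and storage_ok:
                pairs.append((case["id"], ctrl["id"]))
                del available[i]  # remove so the control is not reused
                break
    return pairs


cases = [{"id": "case1", "age": 60, "storage_years": 3.0}]
controls = [
    {"id": "ctrl1", "age": 63, "storage_years": 3.0},  # age differs by 3 years
    {"id": "ctrl2", "age": 61, "storage_years": 2.5},  # within both tolerances
]
print(match_controls(cases, controls))
```

A greedy pass is the simplest way to satisfy the tolerances; it does not guarantee the largest possible number of matched pairs, and additional criteria such as trial centre would need a further condition.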

Validation sample sets: UKCTOCS case-control set: Sera from 
women who developed invasive breast cancer after randomisation 
to UKCTOCS and sera donation were included, provided that they were 
not in the discovery set, had no previous cancer history and had 
physician-confirmed breast cancer with data on histological subtype 
and either stage/grade or both. These were matched to controls, women from 
UKCTOCS with no cancer history either at recruitment or when 
the case was identified, on age (±1 year), length of storage 
(±1 year) and trial centre. 

Guernsey case-control set: Sera were identified from women who 
developed breast cancer up to 30 years post donation. Cases were 
matched on age (±1 year) and date of sample collection (±1 year) 
to two sets of controls: (1) women who had no diagnosis 
of cancer at the time the case was diagnosed; and (2) women who had 
not developed cancer during follow-up (range 18-32 years). 

Serum storage: UKCTOCS: all samples had been stored in liquid 
nitrogen since collection. The aliquot used for this analysis had 
never previously been freeze-thawed. Once the aliquot was thawed, 
it was divided into smaller aliquots and refrozen. Guernsey: all 
samples were stored in aliquots at −20 °C and the aliquot used for 
this analysis had never been freeze-thawed. Once an aliquot was 
thawed, it was divided into smaller aliquots and refrozen. 

Ovarian, lung and pancreatic cancer. Sera from UKCTOCS 
women who developed ovarian, lung and pancreatic cancers 
following randomisation to the trial were identified. Controls were 
healthy trial participants who did not have a notification of cancer 
at the time the case samples were identified. Cases were matched 
1 : 1 to controls on the basis of age at donation (±1 year) and time 
in freezer (±1 year). 

Microarray autoantibody assay. Glycopeptides and recombinant 
glycoprotein: Synthetic 60mer MUC1 peptides corresponding to three 
20-amino-acid tandem repeats and MUC2 peptides were 
synthesised and glycosylated in preparative scale using recombinant 
enzymes produced in insect cells (Tarp et al, 2007; Wandall 
et al, 2010; Blixt et al, 2011); see Table 1 for a list of the 
glycopeptides, their glycan structure and sites of glycosylation. As 
with our previous study (Blixt et al, 2011), this study confirmed that 
the use of 20mers (one tandem repeat) or 60mers (three tandem 
repeats) gave comparable results (see Supplementary Figure 1). All 
structures were purified by preparative HPLC and analysed by 
MALDI-TOF as described (Tarp et al, 2007). Recombinant MUC1-based 
glycoproteins carrying ST and T were produced in CHOK1 
cells as described by Backstrom et al (2003) and those without 
O-linked glycans or carrying the Tn glycan were produced in CHO 
ldlD cells. 

Microarrays: Glycopeptide arrays were custom printed by 
Arrayit (Sunnyvale, CA, USA) onto Schott Nexterion Slide H 
(Schott AG, Mainz, Germany) with 16 arrays per slide. Each 
peptide or glycopeptide was printed (0.5 nl) in triplicate and at 
three concentrations (50, 25 and 12.5 µM) and each recombinant 
protein at 250, 125 and 62.5 pg. The quality of the printed 
glycopeptides was checked by staining with glycoform-specific 
lectins and antibodies as described previously (Blixt et al, 2011). 
Human IgG was also printed as a positive control for the second 
antibody and to orientate the arrays. 

Sera were diluted 1 : 50 and the arrays screened as described by 
Blixt et al (2011). The slides were scanned in a PerkinElmer 
Scanarray and the images quantified with ProScanArray Express 
software programme (PerkinElmer, Cambridge, UK). Spots were 
identified using automated spot finding or manual adjustments for 
occasional irregularities. 

All samples were screened in duplicate with blinding as to case 
or control. The same positive control serum from a breast cancer 
patient from the cohort used in our previous paper (Blixt et al, 
2011) was used on every slide and, where possible, cases and their 
controls were screened on the same slide. Sera were rescreened if 
the duplicates did not agree based on a similarity measure between 
them (see Supplementary Methods). If there was still inadequate 
agreement after the rescreen, the samples were removed from the 
analyses. 

Statistical analysis. In order to quantitatively detect any difference 
between distributions, the data were split into quartiles. The 
quartile division was performed on the entire set of data with no 
information regarding the grouping of samples into cancer cases or 
controls. The null hypothesis was that the samples would be 
distributed randomly over quartiles. A χ² test was performed to see 
if there was a significant difference between the numbers of case and 
control samples in each quartile. This test was chosen to determine 
whether differences between the two groups exist. 
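
The quartile comparison can be sketched in Python as follows. This is an illustrative reading of the description above, not the analysis code used in the study: it assumes that quartile cut points are taken from the pooled case and control values, and that within each quartile the case and control counts are compared against an equal split with a one-degree-of-freedom χ² test. The function names are hypothetical, and the counts in the usage lines (66 cases, 70 controls) are those shown for the first quartile of the unglycosylated MUC1 peptide in Table 3A.

```python
import math
from statistics import quantiles


def quartile_index(value, cuts):
    """Return 0-3 for the quartile containing `value`, given three cut points."""
    return sum(value > c for c in cuts)


def quartile_table(case_values, control_values):
    """Count cases and controls in each quartile of the pooled distribution."""
    pooled = list(case_values) + list(control_values)
    cuts = quantiles(pooled, n=4)  # cut points computed blind to case/control status
    cases = [0, 0, 0, 0]
    controls = [0, 0, 0, 0]
    for v in case_values:
        cases[quartile_index(v, cuts)] += 1
    for v in control_values:
        controls[quartile_index(v, cuts)] += 1
    return cases, controls


def chi2_p_equal_split(n_cases, n_controls):
    """Chi-square test (1 d.f.) of observed counts against an equal split."""
    expected = (n_cases + n_controls) / 2
    chi2 = ((n_cases - expected) ** 2 + (n_controls - expected) ** 2) / expected
    # Survival function of the chi-square distribution with 1 d.f.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


chi2, p = chi2_p_equal_split(66, 70)
print(f"chi2 = {chi2:.3f}, P = {p:.3f}")
```

Using `math.erfc` avoids any dependency beyond the standard library; a statistics package would give the same P-value for one degree of freedom.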

In addition, using a cutoff of two s.d. from the mean of the control 
sera for each antigen, the fraction of autoantibody-positive 
sera was compared between cases and controls in the two 
validation sets. Receiver operating characteristic (ROC) curves were 
constructed for each of the MUC1 antigens on the arrays and, by 
giving equal weight to all features, a generalised ROC curve was 
formed. 
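
A minimal Python sketch of the cutoff and ROC calculations for a single antigen is given below. It is illustrative only: it assumes that the cutoff is the control mean plus two sample standard deviations, and it computes the area under the ROC curve through its rank-based equivalent (the probability that a randomly chosen case scores higher than a randomly chosen control, counting ties as one half). The function names are hypothetical, and the construction of the generalised ROC curve across all antigens is not reproduced here.

```python
from statistics import mean, stdev


def positive_fraction(values, control_values, n_sd=2.0):
    """Fraction of `values` above the control mean plus `n_sd` standard deviations."""
    cutoff = mean(control_values) + n_sd * stdev(control_values)
    return sum(v > cutoff for v in values) / len(values)


def auc(case_values, control_values):
    """Area under the ROC curve via pairwise comparison of cases and controls."""
    wins = 0.0
    for c in case_values:
        for k in control_values:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))


# Identical distributions give an AUC of 0.5, that is, no discrimination.
print(auc([1, 2, 3], [1, 2, 3]))
```

An AUC close to 0.5 corresponds to a ROC curve lying on the diagonal, meaning that the antigen does not separate cases from controls.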



RESULTS 



Sample selection. From the UKCTOCS, 258 women who went on 
to develop invasive breast cancer up to 4 years following sample 
donation were identified for the discovery set. Eighteen women 
were ineligible because of a previous history of breast (12) or other 
cancer (6) at randomisation. From the remaining 240 cases, 273 
serum samples were available meaning that 33 women donated two 
serum samples at different times prediagnosis. Analysis of these 
duplicate samples from the same woman showed no significant 
differences between the values (data not shown). There were 273 
samples from 273 control women. There was no significant 
difference in baseline characteristics between cases and controls 
(Table 2A) although there was a trend for increased weight in the 
cases at randomisation, a known risk factor for breast cancer. All 
women were postmenopausal. 

The UKCTOCS validation set included a single serum sample 
from each of 431 cases and 431 controls. The Guernsey set 



Table 1. Structure of the MUC1 glycopeptides used on the arrays 

Name | Peptide backbone | Glycan | Sites per repeat 
MUC1 unglycosylated | (VTSAPDTRPAPGSTAPPAHG)3 | N/A | N/A 
MUC1 unglycosylatedRec | 16 tandem repeats plus the amino terminus expressed in CHO cells | N/A | N/A 
MUC1core3 | (VTSAPDTRPAPGSTAPPAHG)3 | GlcNAcβ1,3GalNAcα- | 5 
MUC1STn | (VTSAPDTRPAPGSTAPPAHG)3 | Neu5Acα2,6GalNAcα- | 5 
MUC1T | (VTSAPDTRPAPGSTAPPAHG)3 | Galβ1,3GalNAcα- | 5 
MUC1TRec | 16 tandem repeats plus the amino terminus expressed in CHO cells | Galβ1,3GalNAcα- | Average of 4.3 
MUC1Tn | (VTSAPDTRPAPGSTAPPAHG)3 | GalNAcα- | 5 
MUC1TnRec | 16 tandem repeats plus the amino terminus expressed in CHO cells | GalNAcα- | Average of 3.8 
MUC1STRec | 16 tandem repeats plus the amino terminus expressed in CHO cells | Neu5Acα2,3Galβ1,3GalNAcα- | Average of 4.3 
MUC2Core3 | | GlcNAcβ1,3GalNAcα- | 11 
MUC2Tn | | GalNAcα- | 11 

Table 2. Baseline characteristics of the cohorts used in the study 

(A) Baseline characteristics of UKCTOCS cohorts used for the discovery and validation studies 

Median (25th-75th centiles) 
Characteristic | Discovery: controls (N = 273) | Discovery: breast cancer cases (N = 240) | Discovery: controls and BC (N = 513) | Validation: controls (N = 431) | Validation: breast cancer cases (N = 431) | Validation: controls and BC (N = 862) 
Age (years) at randomisation | 60 (55-65) | 60 (56-66) | 60 (56-65) | 61 (55-66) | 61 (57-66) | 61 (57-66) 
Years since last period at randomisation | 11 (5-19) | 10 (5-17) | 11 (5-18) | 12 (6-18) | 11 (5-18) | 11 (6-18) 
Duration of HRT use in those who were on HRT at randomisation (years) | 8 (5-12) | 8 (5-11) | 8 (5-12) | 8 (5-12) | 11 (6-13) | 10 (5-13) 
Duration of OCP use (years) in those who had used it | 6 (3-10) | 6 (2-10) | 6 (3-10) | 5 (2-10) | 5 (2-10) | 5 (2-10) 
Height (cm) | 163 (156-165) | 163 (158-168) | 163 (158-168) | 163 (158-165) | 163 (158-168) | 163 (158-168) 
Weight (kg) | 67 (60-76) | 69 (63-76) | [?] (61-76) | 66 (60-76) | [?] (61-76) | 68 (60-76) 

Number (%) 
Characteristic | Discovery: controls (N = 273) | Discovery: breast cancer cases (N = 240) | Discovery: controls and BC (N = 513) | Validation: controls (N = 431) | Validation: breast cancer cases (N = 431) | Validation: controls and BC (N = 862) 
Ethnicity: White | 262 (95.8%) | 234 (97.5%) | 496 (96.7%) | 417 (96.6%) | 426 (98.8%) | 843 (97.8%) 
Ethnicity: Other | 9 (3.3%) | 6 (2.5%) | 15 (2.9%) | 13 (3.0%) | 4 (0.9%) | 17 (2.0%) 
Ethnicity: Missing | 2 (0.8%) | 0 (0%) | 2 (0.4%) | 1 (0.2%) | 1 (0.2%) | 2 (0.2%) 
Hysterectomy | 51 (19.2%) | 46 (19.2%) | 97 (18.9%) | 75 (17.4%) | 83 (19.3%) | 158 (18.3%) 
Ever use of OCP | 144 (53.3%) | 138 (57.5%) | 282 (55.0%) | 258 (59.8%) | 276 (64.0%) | 534 (62.0%) 
Use of HRT at recruitment | 70 (26.3%) | 55 (23.0%) | 125 (24.6%) | 72 (16.7%) | 142 (33.0%) | 214 (24.9%) 
Women having 1 or more pregnancies | 235 (85.4%) | 203 (84.6%) | 438 (85.4%) | 388 (90.0%) | 374 (86.8%) | 762 (88.4%) 
Women having 1 or more miscarriages | 73 (25.8%) | 62 (25.8%) | 135 (26.3%) | 114 (26.5%) | 131 (30.4%) | 245 (28.4%) 

(B) Baseline characteristics of breast cancer cases and controls from the Guernsey cohort used in the validation study 

Median (25th-75th centiles) 
Characteristic | Controls (N = 664) | Breast cancer cases (N = 332) | Controls and BC (N = 996) 
Age (years) at serum donation | 50 (42-58) | 50 (43-57) | 50 (42-58) 
Duration of HRT use in those who were on HRT at time of serum donation (years) | 7 (3-20) | 11 (4-30) | [?] (3-23) 
Duration of OCP use (years) in those who had used it | 3.8 (1-7.3) | 4 (1-9.6) | 3.9 (1-9.6) 
Height (cm) | 160 (155-164) | 161 (157-165) | 160 (156-165) 
Weight (kg) | 64 (57-70) | 64 (60-72) | 64 (58-71) 

Number (%) 
Characteristic | Controls (N = 664) | Breast cancer cases (N = 332) | Controls and BC (N = 996) 
Ethnicity: Missing | 664 (100%) | 332 (100%) | 996 (100%) 
Menopausal status: Pre | 259 (39.0%) | 114 (34.3%) | 373 (37.5%) 
Menopausal status: Peri | 54 (8.1%) | 22 (6.6%) | 76 (7.6%) 
Menopausal status: Post | 249 (37.5%) | 151 (45.6%) | 400 (40.2%) 
Menopausal status: Hysterectomy | 102 (15.4%) | 45 (13.6%) | 147 (14.6%) 
Use of oral contraceptive pill: Ever | 94 (14.1%) | 51 (15.4%) | 145 (14.6%) 
Use of oral contraceptive pill: Unknown | 439 (66.1%) | 221 (66.6%) | 660 (66.3%) 
Use of HRT at recruitment | 36 (5.4%) | 20 (6.0%) | 56 (5.6%) 
Women having 1 or more pregnancies | 560 (84.3%) | 288 (86.7%) | 848 (85.1%) 
Women having 1 or more miscarriages | 144 (21.7%) | 86 (25.9%) | 230 (23.1%) 

Abbreviations: BC = breast cancer; HRT = hormone replacement therapy; OCP = oral contraceptive pill; UKCTOCS = UK Collaborative Trial of Ovarian Cancer Screening. 

Figure 1. Autoantibodies to MUC1 in sera from women who subsequently developed breast cancer and matched controls. Dot blots showing 
the reactivity of autoantibodies present in the discovery sera from women who went on to develop breast cancer (red dots, n = 273) and controls 
(blue dots, n = 273), from the UKCTOCS discovery set. The peptide, glycopeptides and glycoproteins (Rec) present on the arrays are indicated 
beneath each dot plot. The numbers (50, 25 and 12.5, and 250, 125 and 62.5) refer to the three concentrations spotted onto the arrays in µM for the 
peptide and glycopeptides, and in pg for the recombinant glycoproteins. 



included sera from 332 women who were later diagnosed with 
breast cancer, together with 664 age-matched controls (332 who 
did not have any type of cancer when their matched case was 
diagnosed and 332 who were alive and without cancer after up to 
32 years of follow-up (range 18-32 years)). There was no difference in 
baseline characteristics between cases and controls for the 862 
women in the UKCTOCS cohort or the 996 women in the Guernsey cohort in the 
validation set (see Table 2A and B, respectively). The median age of 
the UKCTOCS cohort was 61 (IQR: 57-66) and all women were 
postmenopausal. 



Time to diagnosis of breast cancer. In the UKCTOCS discovery 
set, the cases all donated sera up to 4 years before clinical diagnosis 
of breast cancer, 94% (257) preceding cancer diagnosis by 3 years 
or less. For the validation set, 95% (406 samples) of the breast 
cancer cases identified from the UKCTOCS cohort donated serum 
up to 4 years before clinical cancer diagnosis. In the Guernsey set, 
25% of samples preceded cancer diagnosis by 6 months to 5 years 
with a further 27% collected 5-10 years before diagnosis. 
Supplementary Table 1 details the subtype, stage and grade of 
the tumours in women diagnosed with breast cancer. 

Table 3. Reactivity of discovery set and validation set sera from women 
who subsequently developed breast cancer (cases) and controls 

(A) Discovery set 

MUC1 peptide/glycopeptide antigen | Samples | Q1 | Q2 | Q3 | Q4 
MUC1 peptide unglycosylated (60mer) | Cases | 66 | 70 | 61 | 76 
 | Controls | 70 | 67 | 75 | 61 
 | P-value | 0.732 | 0.798 | 0.230 | 0.200 
MUC1core3 glycopeptide (60mer) | Cases | 63 | 63 | 66 | 81 
 | Controls | 73 | 73 | 71 | 56 
 | P-value | 0.391 | 0.391 | 0.670 | 0.0327 
MUC1STn glycopeptide (60mer) | Cases | 67 | 68 | 60 | 78 
 | Controls | 69 | 68 | 77 | 59 
 | P-value | 0.864 | 1 | 0.146 | 0.105 
MUC1Tn glycopeptide (60mer) | Cases | 62 | 60 | 69 | 82 
 | Controls | 74 | 77 | 67 | 55 
 | P-value | 0.303 | 0.146 | 0.864 | 0.0210 
MUC1T glycopeptide (60mer) | Cases | 69 | 60 | 70 | 74 
 | Controls | 67 | 77 | 66 | 63 
 | P-value | 0.864 | 0.146 | 0.732 | 0.347 
MUC1T CHO recombinant (16 tandem repeats) | Cases | 64 | 68 | 68 | 73 
 | Controls | 72 | 69 | 68 | 64 
 | P-value | 0.492 | 0.932 | 1 | 0.442 
MUC1Tn CHO recombinant (16 tandem repeats) | Cases | 58 | 70 | 70 | 75 
 | Controls | 78 | 67 | 66 | 62 
 | P-value | 0.086 | 0.798 | 0.732 | 0.267 
MUC1ST CHO recombinant (16 tandem repeats) | Cases | 62 | 61 | 73 | 77 
 | Controls | 74 | 75 | 64 | 60 
 | P-value | 0.303 | 0.230 | 0.442 | 0.146 
MUC2core3 glycopeptide | Cases | 61 | 71 | 71 | 70 
 | Controls | 74 | 67 | 65 | 65 
 | P-value | 0.263 | 0.733 | 0.607 | 0.798 

(B) Validation set 

UKCTOCS cohort 

MUC1 peptide/glycopeptide antigen | Samples | Q1 | Q2 | Q3 | Q4 
MUC1 peptide unglycosylated (60mer) | Cases | 109 | 108 | 106 | 103 
 | Controls | 104 | 105 | 107 | 110 
 | P-value | 0.732 | 0.837 | 0.945 | 0.631 
MUC1core3 glycopeptide (60mer) | Cases | 108 | 97 | 114 | 107 
 | Controls | 105 | 116 | 99 | 106 
 | P-value | 0.837 | 0.193 | 0.304 | 0.945 
MUC1STn glycopeptide (60mer) | Cases | 103 | 108 | 110 | 105 
 | Controls | 110 | 105 | 103 | 108 
 | P-value | 0.631 | 0.837 | 0.631 | 0.837 
MUC1Tn glycopeptide (60mer) | Cases | 115 | 93 | 105 | 113 
 | Controls | 98 | 120 | 108 | 100 
 | P-value | 0.244 | 0.064 | 0.837 | 0.373 
MUC1T CHO recombinant (16 tandem repeats) | Cases | 105 | 113 | 96 | 112 
 | Controls | 108 | 100 | 117 | 101 
 | P-value | 0.837 | 0.373 | 0.150 | 0.451 
MUC1Tn CHO recombinant (16 tandem repeats) | Cases | 103 | 103 | 107 | 113 
 | Controls | 110 | 110 | 106 | 100 
 | P-value | 0.631 | 0.631 | 0.945 | 0.373 
MUC1ST CHO recombinant (16 tandem repeats) | Cases | 115 | 106 | 98 | 107 
 | Controls | 98 | 107 | 115 | 106 
 | P-value | 0.244 | 0.945 | 0.244 | 0.945 

Guernsey cohort 

MUC1 peptide/glycopeptide antigen | Samples | Q1 | Q2 | Q3 | Q4 
MUC1 peptide unglycosylated (60mer) | Cases | 74 | 66 | 78 | 85 
 | Control 1 | 77 | 69 | 86 | 71 
 | Control 2 | 76 | 92 | 64 | 71 
 | P-value | 0.806 | 0.796 | 0.532 | 0.262 
MUC1core3 glycopeptide (60mer) | Cases | 77 | 68 | 84 | 74 
 | Control 1 | 78 | 79 | 76 | 70 
 | Control 2 | 72 | 80 | 68 | 83 
 | P-value | 0.936 | 0.364 | 0.527 | 0.739 
MUC1STn glycopeptide (60mer) | Cases | 79 | 78 | 72 | 74 
 | Control 1 | 72 | 78 | 77 | 76 
 | Control 2 | 76 | 71 | 79 | 77 
 | P-value | 0.569 | 1 | 0.682 | 0.870 
MUC1Tn glycopeptide (60mer) | Cases | 73 | 72 | 76 | 82 
 | Control 1 | 81 | 79 | 79 | 64 
 | Control 2 | 73 | 76 | 73 | 81 
 | P-value | 0.519 | 0.569 | 0.809 | 0.136 
MUC1T CHO recombinant (16 tandem repeats) | Cases | 83 | 74 | 71 | 75 
 | Control 1 | 70 | 80 | 74 | 79 
 | Control 2 | 74 | 73 | 83 | 73 
 | P-value | 0.293 | 0.629 | 0.803 | 0.240 
MUC1Tn CHO recombinant (16 tandem repeats) | Cases | 84 | 75 | 66 | 78 
 | Control 1 | 75 | 82 | 82 | 64 
 | Control 2 | 68 | 70 | 80 | 85 
 | P-value | 0.475 | 0.576 | 0.188 | 0.240 
MUC1ST CHO recombinant (16 tandem repeats) | Cases | 78 | 81 | 74 | 70 
 | Control 1 | 75 | 70 | 71 | 87 
 | Control 2 | 74 | 76 | 83 | 70 
 | P-value | 0.808 | 0.370 | 0.803 | 0.175 

Abbreviation: UKCTOCS = UK Collaborative Trial of Ovarian Cancer Screening. Cases and 
controls were divided into quartiles dependent on the reactivity of their sera with the 
indicated antigens. 



Screening of discovery set. The detailed structures of the MUC1-based 
peptides, glycopeptides and glycoproteins used in the 
microarray for screening the discovery set are listed in Table 1, 
and are based on the glycoforms used to detect reactive 
autoantibodies in sera from early-stage breast cancer patients 
(Blixt et al, 2011). The results are shown as a dot plot in Figure 1, 
and it can be seen that only two out of 273 samples from 
the breast cancer cases gave a positive reaction with 
unglycosylated recombinant MUC1 (16 tandem repeats). 

To statistically analyse the data, we investigated the distribution 
within quartiles (see Methods for description of quartiles). There 
was no significant difference in distribution of autoantibodies to 
MUC1 glycoforms between cases and controls over quartiles of 
reactivity (Table 3). There was, however, a trend for more cases 
than controls to be in the highest quartile (Q4) for MUC1core3, 
MUC1STn and MUC1Tn (see Table 3A). While a number of sera 
in the case and control groups contained antibodies reactive to core3 
or Tn when carried on MUC1, little reactivity was seen with MUC2 
carrying these glycans, indicating that the epitopes recognised 
consisted of the glycans and the MUC1 backbone (Figure 1). 

We hypothesised that autoantibodies to aberrant glycoforms of MUC1 
arise from an antigen-driven immune response to a clinically 
undetectable tumour. Autoantibodies to other antigens, such as p53 in 
colon cancer (Pedersen et al, 2013) and in lung cancer (Lubin et al, 
1995; Li et al, 2005), and alpha-fetoprotein in hepatic cancer (Zhang 
and Tan, 2010), can only be detected within 3 years of cancer diagnosis. 
We therefore investigated whether the presence of autoantibodies to 
MUC1 glycoforms is associated with breast cancer development in cases 
who developed breast cancer within 3 years of donating sera. The cases 
were stratified into cohorts who developed breast cancer within 
1 year, 1-2 years and 2-3 years of sera donation. Table 4A shows 
that even when sera were taken 1 year or less before breast cancer 
was diagnosed, there was no significant difference in the 
presence of autoantibodies to MUC1 VNTR peptide or MUC1 
glycopeptides between the cases and the age-matched controls. 
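
The stratification by time to diagnosis can be sketched in Python as follows. This is an illustration only: the function name, the sample identifiers and the treatment of values falling exactly on a boundary (assigned to the later stratum, except that exactly 3 years is kept in the last stratum) are assumptions, since the handling of boundary values is not stated.

```python
def stratify_by_time_to_diagnosis(years_to_diagnosis, edges=(0, 1, 2, 3)):
    """Group sample ids into strata such as '0-1', '1-2' and '2-3' years.

    `years_to_diagnosis` maps a sample id to the interval (in years)
    between serum donation and diagnosis. Samples outside the overall
    range are left out of every stratum.
    """
    bounds = list(zip(edges, edges[1:]))
    strata = {f"{lo}-{hi}": [] for lo, hi in bounds}
    for sample_id, years in years_to_diagnosis.items():
        for lo, hi in bounds:
            is_last = hi == edges[-1]
            if lo <= years < hi or (is_last and years == hi):
                strata[f"{lo}-{hi}"].append(sample_id)
                break
    return strata


example = {"s1": 0.5, "s2": 1.0, "s3": 2.9, "s4": 3.0, "s5": 3.5}
print(stratify_by_time_to_diagnosis(example))
```

Each stratum of cases would then be compared with its matched controls using the same quartile-based test as for the full set.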

Table 4. Comparison of autoantibodies to MUC1 glycoforms in cases of breast cancer taken up to 3 years before diagnosis vs controls 

(A) Discovery set 

MUC1 glycoforms: P-value, breast cancer cases vs controls 
Time to diagnosis (years) | No. of sera | MUC1 ungly | MUC1 unglyRec | MUC1core3 | MUC1STn | MUC1T | MUC1TRec | MUC1Tn | MUC1TnRec | MUC1STRec 
0-1 | 93 | 0.245 | 0.598 | 0.124 | 0.374 | 0.362 | 0.560 | 0.171 | 0.418 | 0.299 
1-2 | 90 | 0.121 | 0.409 | 0.024 | 0.311 | 0.373 | 0.359 | 0.320 | 0.453 | 0.204 
2-3 | 74 | 0.347 | 0.110 | 0.185 | 0.583 | 0.251 | 0.322 | 0.763 | 0.961 | 0.331 

(B) Validation set 

UKCTOCS cohort 

MUC1 glycoforms: P-value, breast cancer cases vs controls 
Time to diagnosis (years) | No. of sera | MUC1 ungly | MUC1 unglyRec | MUC1core3 | MUC1STn | MUC1TRec | MUC1Tn | MUC1TnRec | MUC1STRec 
0-1 | 133 | 0.793 | 0.340 | 0.386 | 0.842 | 0.721 | 0.789 | 0.762 | 0.799 
1-2 | 87 | 0.391 | 0.661 | 0.364 | 0.458 | 0.639 | 0.887 | 0.942 | 0.591 
2-3 | 94 | 0.155 | 0.763 | 0.484 | 0.472 | 0.334 | 0.435 | 0.559 | 0.477 

Guernsey cohort 

MUC1 glycoforms: P-value, breast cancer cases vs all controls 
Time to diagnosis (years) | No. of sera | MUC1 ungly | MUC1 unglyRec | MUC1core3 | MUC1STn | MUC1TRec | MUC1Tn | MUC1TnRec | MUC1STRec 
0-3 | 42 | 0.594 | 0.520 | 0.957 | 0.311 | 0.834 | 0.453 | 0.527 | 0.435 

Abbreviation: UKCTOCS = UK Collaborative Trial of Ovarian Cancer Screening. 



Screening of validation set. The coded sera were screened on the 
microarrays. Five samples from the UKCTOCS cases and 29 
samples from the Guernsey cases had to be removed from the 
analysis because the duplicates did not agree based on a similarity 
measure (described in Supplementary Methods), and rescreening 
the sera still showed disagreement. The relevant controls were also 
removed from the analysis. The final analysis, therefore, included 
426 UKCTOCS and 303 Guernsey case samples, with their 
matched controls. Figure 2 shows a dot blot of the results obtained 
from both sera sets for MUC1core3 and MUC1STn, the two 
glycopeptides that gave the highest levels of antibodies in the 
discovery set. There was no difference in the percentage of sera 
showing MUC1core3 or MUC1STn autoantibodies between 
the cases and the controls from either serum bank (Figure 2) 
or between the Guernsey breast cancer cases and the controls 
who did not develop cancer within the extended follow-up period 
(18-32 years). 

The distribution of levels of autoantibodies in cases and controls 
over quartiles of reactivity also showed no significant differences 
between cases and controls in the two independent banks when 
analysed for autoantibodies to all glycopeptides, glycoproteins or 
unglycosylated MUC1 peptides. Also, the trend observed in the
discovery set towards more cases in the highest quartile (Q4)
for MUC1core3, MUC1Tn and MUC1STn was not observed
(see Table 3B). Furthermore, heat map analysis suggested no
correlation between the presence of autoantibodies and
time to diagnosis (see Supplementary Figure 1). However, to
analyse this in greater detail, we again stratified the samples from
the UKCTOCS bank into those donated 0-1 years, 1-2 years and
2-3 years before breast cancer diagnosis. As there were fewer
samples from the Guernsey cohort with shorter times to diagnosis,
we analysed samples taken 0-3 years before diagnosis as a single
stratum. As can be seen from Table 4B, there were no
significant differences in autoantibodies between the cases and
controls, even 0-1 year before diagnosis, in agreement
with the data obtained with the discovery set. The Guernsey serum
samples taken 1-3 years before diagnosis were compared with both 
sets of controls and again, no significant difference was obtained. 
For clarity, the results presented in Table 4B show the cases 
compared with the two sets of controls combined. 
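
To make the quartile comparison concrete, the following is a minimal sketch of how cases and controls might be tabulated over quartiles of reactivity. It is not the authors' analysis code: the text does not state how the quartile cut-points were derived, so taking them from the control group is an assumption, and the function names and signal values are hypothetical.

```python
from bisect import bisect_left
from statistics import quantiles


def quartile_cutpoints(reference_values):
    """Return the three cut-points separating Q1/Q2, Q2/Q3 and Q3/Q4.

    Assumption: cut-points are taken from a reference group (here the
    controls); the paper does not specify this.
    """
    return quantiles(reference_values, n=4)


def quartile_counts(values, cutpoints):
    """Count how many values fall into each of the four quartile bins.

    A value exactly equal to a cut-point is placed in the lower bin.
    """
    counts = [0, 0, 0, 0]
    for value in values:
        counts[bisect_left(cutpoints, value)] += 1
    return counts


# Hypothetical signal intensities, for illustration only.
controls = [1, 2, 3, 4, 5, 6, 7]
cases = [1, 2, 3, 5, 7, 9]

cuts = quartile_cutpoints(controls)
case_counts = quartile_counts(cases, cuts)
control_counts = quartile_counts(controls, cuts)
```

A table of `case_counts` against `control_counts` per antigen could then be tested for a difference in distribution, which is the kind of comparison summarised in Table 3B.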

In addition, ROC curves for each of the MUC1 glycopeptides on
the arrays followed the diagonal, and the areas under the curve did
not differ significantly from 0.5, indicating that the real data could
not be distinguished from randomly generated data
(see Figures 2E and F). Thus, autoantibodies to the MUC1
glycopeptides cannot be used to distinguish cases from
controls.
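
As an illustration of why an area under the curve of 0.5 means no discrimination, the sketch below computes the area under the ROC curve directly as the probability that a randomly chosen case has a higher signal than a randomly chosen control, with ties counted as one half. This is a generic formulation, not the authors' analysis code; the function name and the signal values are hypothetical.

```python
def roc_auc(case_values, control_values):
    """Area under the ROC curve via pairwise comparison.

    Equals P(case > control) + 0.5 * P(case == control), which is
    equivalent to the normalised Mann-Whitney U statistic.
    """
    wins = 0.0
    for case in case_values:
        for control in control_values:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))


# Hypothetical signal intensities, for illustration only.
identical = roc_auc([1, 2, 3], [1, 2, 3])   # same distribution in both groups
separated = roc_auc([4, 5], [1, 2])         # every case above every control
```

When cases and controls have the same distribution of values, the function returns 0.5, corresponding to a ROC curve lying on the diagonal; perfect separation returns 1.0.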

MUC1 autoantibodies in ovarian, lung and pancreatic cancer.

Eighty-nine serum samples taken from 86 women with ovarian
cancer, preceding diagnosis by a mean of 1 year (IQR: 0.4-1.5),
123 sera taken from 123 women preceding lung cancer diagnosis
by a mean of 1.6 years (IQR: 1.0-2.2) and 35 samples taken from
35 women preceding pancreatic cancer by a mean of 1 year (IQR:
0.8-2.0) and matched controls (247) were identified from the
UKCTOCS serum bank. Baseline characteristics are presented in
Supplementary Table 2, and tumour characteristics in
Supplementary Table 3. The samples were screened on the
glycopeptide arrays and there was no difference in autoantibodies
to MUC1core3 and MUC1STn between cases and controls

[Figure 2 plot residue: dot-blot panels annotated with the percentage of positive samples (n): 5.9% (18), 6.6% (20), 5.6% (17), 3.8% (16) and 4.9% (21), y-axis scale x 10^4; ROC panels (E, F), x-axis: 1 - Specificity.]

Figure 2. Autoantibodies to MUC1 glycopeptides do not distinguish breast cancer cases from controls. (A, B, C, D) Dot blots showing the
reactivity of autoantibodies present in the validation sera from women who went on to develop breast cancer and controls. (A, B) Reactivity on
50 µM of MUC1core3 glycopeptide; (C, D) Reactivity on 50 µM of MUC1STn glycopeptide. (A, C) Sera from women in the Guernsey serum
bank who subsequently developed breast cancer (red dots, n = 303), matched controls who were not diagnosed with cancer at the time of
diagnosis of the cases (blue dots, n = 303) and a second cohort of matched controls consisting of sera from 303 women who had not developed
cancer up to 32 years after donation of blood (black dots). (B, D) A second cohort of sera identified from the UKCTOCS bank from women who
subsequently developed breast cancer (red dots, n = 426) and matched controls (blue dots, n = 426). Percentages refer to the percentage of
samples giving values higher than two s.d. above the mean of the controls, and (n) refers to the number of women. (E, F) Receiver operating
characteristic curves of individual and combined features for (E) samples from the Guernsey bank and (F) samples from UKCTOCS. Solid red lines
represent the combination of all MUC1 antigens (see Table 1 for list of antigens) and dotted blue lines represent the individual antigens.
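
The positivity rule in this legend (values more than two s.d. above the mean of the controls) can be sketched as follows. This is only an illustration under stated assumptions: the legend does not say whether the sample or population standard deviation was used (the sample s.d. is assumed here), and the function names and signal values are hypothetical.

```python
from statistics import mean, stdev


def positivity_cutoff(control_values, n_sd=2.0):
    """Cut-off defined as the control mean plus n_sd standard deviations.

    Assumption: the sample standard deviation (n - 1 denominator) is used.
    """
    return mean(control_values) + n_sd * stdev(control_values)


def percent_positive(values, cutoff):
    """Percentage of samples with a value strictly above the cut-off."""
    positives = sum(1 for value in values if value > cutoff)
    return 100.0 * positives / len(values)


# Hypothetical signal intensities, for illustration only.
controls = [1, 2, 3, 4, 5]
cases = [7, 1, 2, 8]

cutoff = positivity_cutoff(controls)
case_rate = percent_positive(cases, cutoff)
control_rate = percent_positive(controls, cutoff)
```

The same cut-off, derived from the controls only, is applied to both groups, so the percentages for cases and controls are directly comparable.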



(Figure 3). Although there appear to be more sera positive
for antibodies to MUC1STn and MUC1core3 among the
controls, this is because there are more control samples (247,
see Supplementary Table 2), and there were only minor differences
in rates of positivity between cases and controls (see legend to Figure 3).
Moreover, stratifying the samples into cohorts of 0-1, 1-2 and
2-3 years before cancer diagnosis did not show any difference
between cases and controls (data not shown).



DISCUSSION 



This is the largest case-control study that we are aware of
exploring the MUC1 autoantibody profile before diagnosis of breast
and other adenocarcinomas. No differences were observed in
autoantibodies recognising MUC1 tumour-associated glycopeptides
in the nested case-control study involving over 1000
serum samples from women who later developed breast cancer and
over 1300 matched controls in two independent cohorts (UKCTOCS
and Guernsey). This was irrespective of the time between
serum donation and diagnosis of cancer, with 273
of the samples analysed being from women who were diagnosed
with breast cancer within 1 year of serum donation. This result
was totally unexpected as we have previously shown that
autoantibodies to MUC1 glycoforms can be detected in sera
from early-stage breast cancer patients when the sera were taken
at or just after the time of diagnosis (Blixt et al, 2011).
Unfortunately, sera were not available at the time of diagnosis
from the cases studied in the present paper. It should be noted
that we did not assay autoantibodies on MUC1 purified
from tumours. However, such material is limited in quantity,
and it is very difficult to obtain homogeneous material that is
standardised from one preparation to the next or from different
individuals.

Similar results were obtained for ovarian, lung and pancreatic
cancer. Our findings suggest that detection of autoantibodies
to MUC1 VNTR peptides, or to glycopeptides and full-length
glycoforms carrying cancer-associated glycans, cannot be used
as a screening tool for early detection of these cancers in the
general population. The results of this robust, validated large-scale
prospective-specimen collection, retrospective-blinded evaluation

[Figure 3 plot residue: dot-blot panels (y-axis scale x 10^4) labelled by antigen and amount spotted; legible labels include MUC1-STRec.125, MUC1-STRec.62.5, MUC1-TRec.62.5, MUC1Core3.12.5, MUC1STn.12.5, MUC1T.12.5 and MUC1Tn.12.5.]

Figure 3. Elevated levels or increased frequency of autoantibodies to MUC1 are not found in sera from ovarian, pancreatic or lung cancer
patients before clinical diagnosis. Dot blots showing the reactivity of autoantibodies present in the sera of women who went on to develop lung
cancer (green dots, n = 123), ovarian cancer (black dots, n = 89), pancreatic cancer (magenta dots, n = 35) or matched controls (blue dots, n = 247).
The peptide, glycopeptides and glycoproteins (Rec) present on the arrays are indicated beneath each dot blot. The numbers (50, 25, 12.5 and
250, 125 and 62.5) refer to the three concentrations spotted onto the arrays, in µM for the peptide and glycopeptides, and in pg for the
recombinant glycoproteins. Positive samples were defined as samples giving values higher than two s.d. above the mean of the controls
and for MUC1core3 and MUC1STn were as follows: MUC1core3: controls 5.9% positive, pancreatic cancer 5.7% positive, ovarian cancer 3.4% positive,
lung cancer 0% positive. MUC1STn: controls 2.3% positive, pancreatic cancer 2.9% positive, ovarian cancer 0% positive.



study have significant implications, as MUC1 has been the focus of
several studies aiming for early detection of breast cancer
(Chapman et al, 2007; Pinheiro et al, 2010; Wandall et al, 2010;
Zhang and Tan, 2010; Blixt et al, 2011; Lacombe et al, 2013).

The robustness of the study design and the large number of sera
screened give us confidence in the validity of the results.



The strengths of the study include (1) a microarray approach,
which allowed simultaneous screening for autoantibodies to
unglycosylated MUC1 (consisting of three tandem repeats of
MUC1), to MUC1 60mer glycopeptides and to recombinant
MUC1 produced in CHO cells and carrying no or defined
O-linked glycans, (2) use of a prospective-specimen collection
with retrospective-blinded evaluation study design (Pepe et al,
2008), (3) validation of results on separate case-control sets
including one from an independent serum bank, (4) additional
controls from the Guernsey cohort with up to 32 years follow-up,
(5) further evaluation of sera from individuals who later developed
other cancers known to express MUC1, namely ovarian, pancreatic
and lung, (6) matching for age and storage time of samples and
(7) well-balanced baseline characteristics between cases and
controls. Limitations include the fact that sera were not available
at the time of diagnosis from the cases studied in the present
paper and the sera had been stored for a number of years
before autoantibody determination. However, it is unlikely that
this resulted in the loss of autoantibody activity, as antibodies to
p53 have been shown to be present (Pedersen et al, 2013) and in
our previous study autoantibodies to MUC1 glycopeptides
were found in the sera from breast cancer patients after a
storage time of 30 years (Blixt et al, 2011). In addition, a significant
proportion of the breast cancer cases would have been screen
detected as a result of the national mammography screening
programme and some of the ovarian cancer cases could also have
been screen detected as UKCTOCS is an ovarian cancer screening
trial. On the other hand, similar results were obtained when we
used preclinical samples from other cancers, especially lung and
pancreas, for which no screening was available in the UK.

When determining the use of anti-MUC1 antibodies for early
detection or cancer risk, most previous studies have looked
for antibodies in sera from cancer patients (Chapman et al, 2007;
Desmetz et al, 2011; Pedersen et al, 2011) and extrapolated
the results to suggest the assay's usefulness in early detection.
We too have previously shown that more sera from stage I and II
breast cancer patients contain autoantibodies compared with
age-matched controls and hypothesised that this might aid
early detection (Blixt et al, 2011). This is in keeping with most
biomarker discovery studies for early detection of cancer,
which are usually carried out on sera collected from patients
with clinical disease (Chapman et al, 2007; Zhong et al, 2008;
Boyle et al, 2011; Lacombe et al, 2013), or on small cohorts
lacking independent validation of the findings (Lubin et al, 1995;
Li et al, 2005; Robertson et al, 2005; Zhong et al, 2006; Pereira-Faca
et al, 2007). There are only a few studies that have used a
prospective sera collection (Pinheiro et al, 2010; Chapman et al,
2012; Pedersen et al, 2013). The other study where preclinical
cancer samples were screened for the presence of autoantibodies
to the unglycosylated MUC1 VNTR was the case-control study
from the Nurses Health cohort involving sera from women who
went on to develop ovarian cancer and healthy controls (Pinheiro
et al, 2010). Autoantibodies to a MUC1 tandem repeat peptide
(consisting of five tandem repeats) were found to be associated
with a lower risk of developing ovarian cancer in those under 64
years of age and a higher risk in women over 64 years of age.
However, the study only included 117 cases with only 27 over 64
years of age, making data interpretation difficult.

Our findings show the importance of validating initial findings 
in a larger sample set, as the trend towards more cases being in the 
highest quartile compared with controls observed in our discovery 
set was subsequently not validated in two independent sets. 

Our results are in contrast to results with p53 as autoantibodies 
to p53 were detected in sera from UKCTOCS women who went on 
to develop colon cancer (Pedersen et al, 2013), providing support 
for the fitness of the UKCTOCS serum bank samples for study of 
autoantibodies. There is considerable effort directed to developing
a screen for antibodies to cancer antigens for individuals at high
risk for lung cancer. In this context, antigen panels, which can
include p53, 14-3-3, Annexin 1 or NY-ESO-1, show promise and
are being evaluated in larger cohorts (Lubin et al, 1995; Li et al,
2005; Pereira-Faca et al, 2007; Qui et al, 2008; Boyle et al, 2011;
Chapman et al, 2012).



p53 is a nuclear protein, as are some of the other antigens
showing promise as inducing autoantibodies before clinical
diagnosis of cancer (Desmetz et al, 2011), while MUC1 is a
membrane antigen. It is not clear whether a difference in
localisation could relate to the early induction of autoantibodies
in cancer patients, unless there is a more stringent tolerance of the
adaptive response to the surface molecules, requiring higher levels
of membrane antigen. Certainly, as long as the normal polarity of
the epithelial cells is intact, the MUC1 glycoprotein will be on the
luminal surface and less accessible to circulating immune cells.
Moreover, while the change in glycosylation of MUC1 is seen in
early-stage (clinically diagnosed) cancers, the timing of this change
in the initiation and progression to malignancy before clinical
diagnosis is not known, and may correlate with a certain level of
loss of ordered tissue architecture. Nonetheless, autoantibodies to
MUC1 do appear in the sera of a proportion of early-stage breast
cancer patients at the time of diagnosis, whereas patients with
benign breast disease have similar levels to controls (Blixt et al,
2011). However, although the data from that study show that
autoantibodies to MUC1 may be useful for determining prognosis
in women with early breast cancer, the results presented here show
that an autoantibody profile to MUC1 is unlikely to be useful
as a screening test for cancer within the general population.
A considerable amount of time and resources is devoted to
developing MUC1-based autoantibody assays, and our results
suggest that these efforts should be focused on other tumour-associated
antigens, possibly nuclear antigens, for early cancer detection and
risk stratification.